Search CORE

449 research outputs found

Shrewd Selection Speeds Surfing: Use Smart EXP3!

Author: Appavoo Anuja Meetoo
Gilbert Seth
Tan Kian-Lee
Publication venue
Publication date: 14/05/2018
Field of study

In this paper, we explore the use of multi-armed bandit online learning techniques to solve distributed resource selection problems. As an example, we focus on the problem of network selection. Mobile devices often have several wireless networks at their disposal. While choosing the right network is vital for good performance, a decentralized solution remains a challenge. The impressive theoretical properties of multi-armed bandit algorithms, like EXP3, suggest that it should work well for this type of problem. Yet, its real-word performance lags far behind. The main reasons are the hidden cost of switching networks and its slow rate of convergence. We propose Smart EXP3, a novel bandit-style algorithm that (a) retains the good theoretical properties of EXP3, (b) bounds the number of switches, and (c) yields significantly better performance in practice. We evaluate Smart EXP3 using simulations, controlled experiments, and real-world experiments. Results show that it stabilizes at the optimal state, achieves fairness among devices and gracefully deals with transient behaviors. In real world experiments, it can achieve 18% faster download over alternate strategies. We conclude that multi-armed bandit algorithms can play an important role in distributed resource selection problems, when practical concerns, such as switching costs and convergence time, are addressed.Comment: Full pape

arXiv.org e-Print Archive

Crossref

Similarity-Driven Cluster Merging Method for Unsupervised Fuzzy Clustering

Author: Tan Kian Lee
Xiong Xuejian
Publication venue
Publication date: 01/01/2004
Field of study

In this paper, a similarity-driven cluster merging method is proposed for unsupervised fuzzy clustering. The cluster merging method is used to resolve the problem of cluster validation. Starting with an overspecified number of clusters in the data, pairs of similar clusters are merged based on the proposed similarity-driven cluster merging criterion. The similarity between clusters is calculated by a fuzzy cluster similarity matrix, while an adaptive threshold is used for merging. In addition, a modified generalized objective function is used for prototype-based fuzzy clustering. The function includes the p-norm distance measure as well as principal components of the clusters. The number of the principal components is determined automatically from the data being clustered. The performance of this unsupervised fuzzy clustering algorithm is evaluated by several experiments of an artificial data set and a gene expression data set.Singapore-MIT Alliance (SMA

DSpace@MIT

Optimization of Analytic Window Functions

Author: Cao Yu
Chan Chee-Yong
Li Jie
Tan Kian-Lee
Publication venue
Publication date: 01/01/2012
Field of study

Analytic functions represent the state-of-the-art way of performing complex data analysis within a single SQL statement. In particular, an important class of analytic functions that has been frequently used in commercial systems to support OLAP and decision support applications is the class of window functions. A window function returns for each input tuple a value derived from applying a function over a window of neighboring tuples. However, existing window function evaluation approaches are based on a naive sorting scheme. In this paper, we study the problem of optimizing the evaluation of window functions. We propose several efficient techniques, and identify optimization opportunities that allow us to optimize the evaluation of a set of window functions. We have integrated our scheme into PostgreSQL. Our comprehensive experimental study on the TPC-DS datasets as well as synthetic datasets and queries demonstrate significant speedup over existing approaches.Comment: VLDB201

arXiv.org e-Print Archive

ScholarBank@NUS

PeerDB-Peering into Personal Databases

Author: Ooi Beng Chin
Tan Kian Lee
Publication venue
Publication date: 01/01/2003
Field of study

In this talk, we will present the design and evaluation of PeerDB, a peer-to-peer (P2P) distributed data sharing system. PeerDB distinguishes itself from existing P2P systems in several ways. First, it is a full-fledge data management system that supports fine-grain content-based searching. Second, it facilitates sharing of data without shared schema. Third, it combines the power of mobile agents into P2P systems to perform operations at peers' sites. Fourth, PeerDB network is self-configurable, i.e., a node can dynamically optimize the set of peers that it can communicate directly with based on some optimization criterion.Singapore-MIT Alliance (SMA

DSpace@MIT

Disseminating streaming data in a dynamic environment: an adaptive and cost-based approach

Author: Ooi Beng
Tan Kian-Lee
Zhou Yongluan
Publication venue
Publication date: 18/06/2018
Field of study

In a distributed stream processing system, streaming data are continuously disseminated from the sources to the distributed processing servers. To enhance the dissemination efficiency, these servers are typically organized into one or more dissemination trees. In this paper, we focus on the problem of constructing dissemination trees to minimize the average loss of fidelity of the system. We observe that existing heuristic-based approaches can only explore a limited solution space and hence may lead to sub-optimal solutions. On the contrary, we propose an adaptive and cost-based approach. Our cost model takes into account both the processing cost and the communication cost. Furthermore, as a distributed stream processing system is vulnerable to inaccurate statistics, runtime fluctuations of data characteristics, server workloads, and network conditions, we have designed our scheme to be adaptive to these situations: an operational dissemination tree may be incrementally transformed to a more cost-effective one. Our adaptive strategy employs distributed decisions made by the distributed servers independently based on localized statistics collected by each server at runtime. For a relatively static environment, we also propose two static tree construction algorithms relying on apriori system statistics. These static trees can also be used as initial trees in a dynamic environment. We apply our schemes to both single- and multi-object dissemination. Our extensive performance study shows that the adaptive mechanisms are effective in a dynamic context and the proposed static tree construction algorithms perform close to optimal in a static environmen

RERO DOC Digital Library